A workshop in reproducible research for the R novice
Session 1
Richard Layton
Rose-Hulman Institute of Technology
2016-08-24
How complete is your homework? Find a partner at the same level
Get connected
Introductions
Handouts
Write down your ideas in response to Mystery question 1:
What is reproducible research?
Research is reproducible when the data and the code used to obtain a finding are available and sufficient for an independent researcher to recreate the finding.
computational, data-intensive
spans the full data, analysis, & publication workflow
most of us have received only perfunctory training (if any)
Christopher Gandrud, Reproducible Research with R and RStudio, 2/e, CRC Press, 2015.
More accountability is needed because of
the primary findings were false. The major effect disappeared after correcting for
coding errors
selective exclusion of available data
unconventional weighting of summary statistics
data were falsified to obtain the research outcomes he wanted, resulting in
retracted journal articles (11 to date)
terminated clinical trials
cancelled research funding
civil suit by patients
Ivan Oransky, It’s official: Anil Potti faked cancer research data, say Feds, Retraction Watch, 2015-11-07.
Scientists and skeptics are in a knife fight, and you don’t bring data to a knife fight.
— Paul Ehrlich
Why should I make the data available to you, when your aim is to try and find something wrong with it?
— Phil Jones
Brad Keyes, Mann retirement: Analysis, reax, Climate Sceptic, 2016-05-08.
Jeff Leek, De-weaponizing reproducibility, 2015-03-13.
If you do anything “by hand” once, you’ll do it 100 times.
— Paul Wilson, UW–Madison
Your closest collaborator is you, six months ago. Have you tried to email that slacker?
— Karl Broman, UW–Madison
To preserve sanity, stop collaborating via email, attachments, and tracking changes in Word.
— Jenny Bryan, UBC
Write scripts (avoid manual copy, paste, mouse-clicks)
Plan the organization and naming scheme for files
Strive for simplicity, readability, reusability, and testability
Agree on a workflow for collaborating before starting a manuscript
DRY (don’t repeat yourself)
Link files explicitly
Plan data management
Postpone optimization
Use version control
License your software
Jenny Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, Tracy Teal, and Greg Wilson, Good enough practices for scientific computing, 2016-01.
Write scripts (avoid manual copy, paste, mouse-clicks)
Plan the organization and naming scheme for files
DRY (don’t repeat yourself)
Link files explicitly
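As a minimal sketch of "write scripts" and DRY in R: the two data vectors and the function name `summarize_trial` below are hypothetical, chosen only for illustration. The idea is that a summary computed by a script can be rerun exactly, and that writing the calculation once as a function avoids copying and pasting it for each data set.

```r
# Hypothetical measurements; in a real project these would be read from data files
trial_a <- c(4.9, 5.1, 5.0, 5.2)
trial_b <- c(9.8, 10.1, 10.0, 9.9)

# DRY: define the summary once instead of repeating the same lines per trial
summarize_trial <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Apply the same function to every trial and collect the results in one table
results <- rbind(
  trial_a = summarize_trial(trial_a),
  trial_b = summarize_trial(trial_b)
)

print(round(results, 3))
```

If a third trial is added later, one more call to the same function extends the table; nothing is retyped by hand.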
Describe the problems that reproducibility helps solve
Identify non-reproducible practices in their current workflow
List two basic principles of reproducible research
Organize directories and files for reproducibility
Create a reproducible report using R and RStudio
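One possible way to organize directories and link files explicitly is sketched below in R. The directory names (`data`, `scripts`, `results`, `reports`), the project name, and the file `calibration.csv` are assumptions made for this example, not a layout prescribed by the workshop. A temporary directory stands in for the project root so the sketch can be run without touching existing files.

```r
# Hypothetical project root; a temporary directory keeps the sketch self-contained
project <- file.path(tempdir(), "my-project")

# Assumed directory layout: one folder per kind of file
dirs <- file.path(project, c("data", "scripts", "results", "reports"))
for (d in dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Link files explicitly: build each path in code from the project root
# rather than navigating to the file by mouse-clicks
data_path <- file.path(project, "data", "calibration.csv")

# Write a small made-up data set, then read it back through the same path
write.csv(data.frame(x = 1:3, y = c(2, 4, 6)), data_path, row.names = FALSE)
dat <- read.csv(data_path)

print(dat)
```

Because every path is built in the script, a collaborator can see exactly which file feeds which step.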
Imagine that you were the author of the “Load cell calibration report”
Carefully review the report and answer Mystery question 2:
Identify as many “manual operations” as possible.
Tutorials to create a dynamic report
Homework:
Continue the tutorials as far as you wish
If you want to start your own reproducible project
Mystery question 3 (turn this one in to me)
What was the muddiest point in the workshop so far?